1 Overview

2 Installing Needed Packages


3 Lab Exercise Questions

From the P2P tutorial

3.1 Install ggplot2

The ggplot2 package is a system for creating graphics based on the grammar of graphics. More information can be found in the ggplot2 documentation. First, we need to install the package (once per machine) and then load it with library() in each R session.

# install.packages("ggplot2")  # uncomment and run once if ggplot2 is not installed
library(ggplot2)              # load ggplot2 for this session
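Because install.packages() downloads the package again every time it runs, a common pattern is to install only when the package is missing. A minimal sketch using requireNamespace():

```r
# Install ggplot2 only if it is not already available, then attach it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)
```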

3.2 Load the dataset: iris

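We will use the Iris flower data set to explore data visualization with ggplot2. The Iris flower data set is a multivariate data set widely used in statistics and machine learning; see the Wikipedia article "Iris flower data set" for background. It ships with R as the built-in data frame iris, so there is nothing to download or import.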

3.2.1 About Iris

  • The data set contains 50 samples of each of 3 species: Iris setosa, Iris versicolor, and Iris virginica.
  • Each sample was measured for 4 features: the length and width of the sepals and the petals, in centimeters.
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
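Since iris is available without any import, a few quick checks confirm the structure described above: 150 rows, four numeric measurements, and the Species factor with 50 samples per level.

```r
# Dimensions: 150 flowers (rows) x 5 columns
dim(iris)  # 150 5
# Column types: four numeric measurements and one factor
str(iris)
# Number of samples per species
table(iris$Species)
# Mean of each measurement, computed separately for each species
aggregate(. ~ Species, data = iris, FUN = mean)
```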

3.3 ggplot2 makes nicer plots than base R

# Base R scatter plot
plot(x = iris$Sepal.Length, y = iris$Sepal.Width,
     xlab = "Sepal Length", ylab = "Sepal Width", main = "Sepal Length-Width")

# The same data with ggplot2: store the base plot, then add layers with +
scatter <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))
scatter + geom_point(aes(color = Species, shape = Species)) +
  xlab("Sepal Length") + ylab("Sepal Width") +
  ggtitle("Sepal Length-Width") + theme_minimal() +
  theme(plot.background = element_rect(fill = "lightblue"))
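For comparison, coloring the base R scatter plot by species takes manual bookkeeping: the factor has to be turned into color and symbol codes by hand, and the legend has to be built separately. ggplot2 derives both from aes(). A sketch of the base R version:

```r
# Convert the Species factor to integer codes: 1 = setosa, 2 = versicolor, 3 = virginica
species_id <- as.integer(iris$Species)

plot(iris$Sepal.Length, iris$Sepal.Width,
     col = species_id, pch = species_id,   # color and symbol chosen from the codes
     xlab = "Sepal Length", ylab = "Sepal Width", main = "Sepal Length-Width")

# The legend must be added by hand and kept in sync with the codes above
legend("topright", legend = levels(iris$Species),
       col = 1:3, pch = 1:3, title = "Species")
```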

3.4 ggplot2 basics

To make a graph, we need 3 elements:
  • a data set
  • a coordinate system
  • geoms: visual marks that represent data points, whose visual properties (aesthetics) include x, y, size, color, …
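The three elements can be written out explicitly. coord_cartesian() is already the default coordinate system, so adding it changes nothing visually; it appears here only to make the second element visible in the code.

```r
library(ggplot2)

p <- ggplot(data = iris,                                # 1. the data set
            mapping = aes(x = Sepal.Length, y = Sepal.Width)) +
  coord_cartesian() +                                   # 2. the coordinate system (the default)
  geom_point()                                          # 3. a geom: one point per row of iris
p
```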

3.4.1 First, we initiate the graph
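Calling ggplot() with only a data set and an aesthetic mapping sets up the plot but draws no data, because no geom has been added yet; printing it typically shows an empty panel with axes.

```r
library(ggplot2)

# Data + mapping only: axes are set up, but no points are drawn yet
canvas <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))
canvas
```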

3.4.2 Second, we add a geom layer

ggplot(data=iris,aes(x=Sepal.Length,y=Sepal.Width))+geom_point()
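Each + adds another layer on top of the previous ones, reusing the same data and mapping. As a sketch, geom_rug() adds short tick marks along both axes that show the marginal distribution of each variable:

```r
library(ggplot2)

p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +          # layer 1: one point per flower
  geom_rug(alpha = 0.3)   # layer 2: marginal ticks on the x and y axes
p
```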

3.4.3 Then we assign different colors and shapes to the data points of different species.

ggplot(data=iris,aes(x=Sepal.Length,y=Sepal.Width))+geom_point(aes(color=Species, shape=Species))
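Putting an aesthetic inside aes() maps it to a variable, so ggplot2 builds a scale and a legend for it. Setting the same aesthetic outside aes() applies one fixed value to every point and adds no legend. A small comparison:

```r
library(ggplot2)

# Mapped: color depends on Species, so each species gets its own color and a legend
mapped <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species))

# Set: every point gets the same fixed color and size, and no legend is drawn
fixed <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "steelblue", size = 2)
```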

3.4.4 We label the x and y axes and set a title.

ggplot(data=iris,aes(x=Sepal.Length,y=Sepal.Width))+geom_point(aes(color=Species, shape=Species))+
  xlab("Sepal Length") +  ylab("Sepal Width") +
  ggtitle("Sepal Length-Width")
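labs() sets the same labels in a single call and can also rename legend titles. Because color and shape are both mapped to Species, giving them the same legend title keeps them merged into one legend. The "(cm)" units come from the data description above.

```r
library(ggplot2)

p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species, shape = Species)) +
  labs(x = "Sepal Length (cm)", y = "Sepal Width (cm)",
       title = "Sepal Length-Width",
       color = "Species", shape = "Species")  # same title, so one merged legend
p
```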

3.4.5 We can use the built-in themes to change the style of the background and grid

ggplot(data=iris,aes(x=Sepal.Length,y=Sepal.Width))+geom_point(aes(color=Species, shape=Species))+
  xlab("Sepal Length") +  ylab("Sepal Width") +
  ggtitle("Sepal Length-Width") + theme_minimal()
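Other complete themes include theme_bw(), theme_classic(), theme_light() and the default theme_gray(). To use one for every plot in a session instead of adding it each time, theme_set() changes the default and returns the previous theme so it can be restored:

```r
library(ggplot2)

old <- theme_set(theme_minimal())   # later plots use theme_minimal() by default
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
theme_set(old)                      # restore the previous default theme
```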

3.4.6 We can also customize the theme

ggplot(data=iris,aes(x=Sepal.Length,y=Sepal.Width))+geom_point(aes(color=Species, shape=Species))+
  xlab("Sepal Length") +  ylab("Sepal Width") +
  ggtitle("Sepal Length-Width") + theme_minimal() + theme(plot.background = element_rect(fill = "lightblue"))
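theme() accepts many more elements. A sketch that centers and bolds the title, removes the minor grid lines, and moves the legend below the panel. The theme() call comes after theme_minimal() because a complete theme added later would override these settings.

```r
library(ggplot2)

p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species, shape = Species)) +
  ggtitle("Sepal Length-Width") +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),  # bold, centered title
    panel.grid.minor = element_blank(),                    # drop minor grid lines
    legend.position = "bottom"                             # legend below the panel
  )
p
```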

3.4.7 We can also pick a different color palette

ggplot(data=iris,aes(x=Sepal.Length,y=Sepal.Width))+geom_point(aes(color=Species, shape=Species))+
  xlab("Sepal Length") +  ylab("Sepal Width") +
  ggtitle("Sepal Length-Width") + theme_dark() +scale_colour_brewer(palette = "Pastel2")
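Besides the ColorBrewer palettes used by scale_colour_brewer() (such as "Pastel2"), colors can be assigned to each species by hand with scale_colour_manual(). The hex codes below are hypothetical choices; any R color names or hex codes work.

```r
library(ggplot2)

# Hypothetical colors, named by factor level so each species gets a fixed color
species_colors <- c(setosa = "#1b9e77", versicolor = "#d95f02", virginica = "#7570b3")

p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species, shape = Species)) +
  scale_colour_manual(values = species_colors)
p
```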

3.5 ggplot + statistics
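A straight-line fit per species is a simple baseline before trying a smoother. As a sketch, method = "lm" fits an ordinary least-squares line to each group defined by color = Species and draws a shaded confidence band (se = TRUE is the default). Passing formula = y ~ x explicitly also silences the "using formula" message.

```r
library(ggplot2)

p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(shape = Species)) +
  geom_smooth(method = "lm", formula = y ~ x)  # one least-squares line per species
p
```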

3.5.1 Let's add a local polynomial (LOESS) smoother

ggplot(data=iris,aes(x=Sepal.Length,y=Sepal.Width,color=Species))+geom_point(aes( shape=Species))+
  xlab("Sepal Length") +  ylab("Sepal Width") +
  ggtitle("Sepal Length-Width") + theme_minimal() + geom_smooth(method="loess")
## `geom_smooth()` using formula 'y ~ x'
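Because color = Species is set in the top-level aes(), the data are split into three groups and geom_smooth() fits a separate LOESS curve to each. A comparable fit for one species can be computed directly with stats::loess(); its span argument (0.75 by default, the same default geom_smooth() uses) controls how wiggly the curve is, with smaller values following the data more closely.

```r
# Fit LOESS for setosa only, mirroring one of the three curves in the plot above
setosa <- subset(iris, Species == "setosa")
fit <- loess(Sepal.Width ~ Sepal.Length, data = setosa, span = 0.75)

# Predicted Sepal.Width at 5 points across the observed Sepal.Length range
grid <- data.frame(Sepal.Length = seq(min(setosa$Sepal.Length),
                                      max(setosa$Sepal.Length),
                                      length.out = 5))
pred <- predict(fit, newdata = grid)
pred
```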

3.6 Facet functions split one plot into multiple panels in one figure

# facet_grid(Species ~ .): one row of panels per species (panels stacked vertically)
ggplot(data=iris,aes(x=Sepal.Length,y=Sepal.Width,color=Species))+geom_point(aes(shape=Species))+
  xlab("Sepal Length") +  ylab("Sepal Width") +
  ggtitle("Sepal Length-Width") + theme_minimal() + geom_smooth(method="loess") + facet_grid(Species ~ .)
## `geom_smooth()` using formula 'y ~ x'

# facet_grid(. ~ Species): one column of panels per species (panels side by side)
ggplot(data=iris,aes(x=Sepal.Length,y=Sepal.Width,color=Species))+geom_point(aes(shape=Species))+
  xlab("Sepal Length") +  ylab("Sepal Width") +
  ggtitle("Sepal Length-Width") + theme_minimal() + geom_smooth(method="loess") + facet_grid(. ~ Species)
## `geom_smooth()` using formula 'y ~ x'
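facet_grid() lays panels out along rows and/or columns. facet_wrap() instead wraps a single facetting variable into a grid with a chosen number of columns, which helps when there are many levels. A sketch:

```r
library(ggplot2)

p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(shape = Species)) +
  geom_smooth(method = "loess", formula = y ~ x) +
  facet_wrap(~ Species, ncol = 2)   # three panels wrapped into a 2-column grid
p
```

Adding scales = "free" to facet_wrap() lets each panel use its own axis ranges instead of a shared one.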

3.7 Last but not least, we should save our figure

ggsave("plot.png") # saves the last plot displayed (the default is plot = last_plot())
## Saving 7 x 5 in image
## `geom_smooth()` using formula 'y ~ x'
# Or, if you have several plots, store the one you want in a variable and pass it to ggsave().
# Naming it sepal_plot (not plot) avoids masking base R's plot() function.
sepal_plot <- ggplot(data=iris,aes(x=Sepal.Length,y=Sepal.Width,color=Species))+geom_point(aes(shape=Species))+
  xlab("Sepal Length") +  ylab("Sepal Width") +
  ggtitle("Sepal Length-Width") + theme_bw() + geom_smooth(method="loess") + facet_grid(. ~ Species)
ggsave("sepal_plot.png", plot = sepal_plot)  # a new file name, so plot.png is not overwritten
## Saving 7 x 5 in image
## `geom_smooth()` using formula 'y ~ x'
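Without width and height, ggsave() uses the size of the current graphics device (hence "Saving 7 x 5 in image"). Fixing the size and resolution makes the output reproducible, and the file type follows the extension. A sketch writing to a temporary directory:

```r
library(ggplot2)

p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(shape = Species))

# Raster output: explicit size in inches and 300 dots per inch
out_png <- file.path(tempdir(), "sepal.png")
ggsave(out_png, plot = p, width = 6, height = 4, units = "in", dpi = 300)

# Vector output: the .pdf extension selects the PDF device
out_pdf <- file.path(tempdir(), "sepal.pdf")
ggsave(out_pdf, plot = p, width = 6, height = 4)
```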